PHIS, an Information System for Phenotyping

The project involved development work on PHIS, an information system dedicated to phenotyping data, with a focus on integrating a workflow manager. The mission ran from September 2021 to August 2022, in collaboration with the LEPSE laboratory (Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux). The development was supervised by the Information Systems Department (DSI) of INRAE (National Research Institute for Agriculture, Food and Environment), with the UMR (Joint Research Unit) LEPSE responsible for scientific and business aspects.

The challenges were multiple: ensuring the necessary maintenance for the platform's proper functioning while working on the integration of a workflow manager, a crucial aspect for guaranteeing the accessibility and use of public scientific resources.

Tasks & Objectives

As a full-stack developer, I developed and debugged both the frontend and the backend of the application, while integrating a workflow manager to automate data processing pipelines. One of the main objectives was to ensure smooth user journeys, particularly for depositing and retrieving data on the platform.

Success criteria included not only bug fixes and application maintenance but also complete integration of the workflow manager with the platform's technologies, particularly API Platform, the API framework built on Symfony. A key objective was to decouple workflow management from the development team, so that scientists could define their own workflows. Finally, it was essential to develop robust end-to-end tests.

Actions and Development

My first step was to familiarize myself with the PHIS environment, which consists of a backend built with Symfony and an Angular frontend. I then created a dedicated repository to host the workflow manager and set up a pipeline that extracts content from this repository and automatically generates annotations for API Platform. For end-to-end tests, I used Robot Framework with the Selenium library.
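The annotation-generation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the WorkflowDefinition shape and the function names are hypothetical, and the output assumes the docblock annotation syntax of API Platform 2.x (shortName and description options of @ApiResource).

```typescript
// Hypothetical shape of one workflow definition read from the workflow repository.
// The real PHIS file format is not shown in this write-up.
interface WorkflowDefinition {
  name: string;        // e.g. "LeafAreaPipeline"
  description: string; // free text written by the scientist
}

// Doctrine-style annotation strings escape a double quote by doubling it.
function escapeAnnotationString(value: string): string {
  return value.replace(/"/g, '""');
}

// Transform step: render a PHP docblock carrying an @ApiResource annotation.
function toApiResourceDocblock(def: WorkflowDefinition): string {
  return [
    "/**",
    " * @ApiResource(",
    ` *     shortName="${escapeAnnotationString(def.name)}",`,
    ` *     description="${escapeAnnotationString(def.description)}"`,
    " * )",
    " */",
  ].join("\n");
}

// Usage with an invented workflow definition.
const docblock = toApiResourceDocblock({
  name: "LeafAreaPipeline",
  description: 'Computes leaf area from "top view" images',
});
console.log(docblock);
```

Generating the annotation as text keeps the workflow repository as the single source of truth: scientists edit workflow files, and the PHP side is regenerated rather than edited by hand.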

Regular exchanges with the project, scientific, and IT teams, as well as with the former development team, facilitated my work. Collaboration with the LEPSE UMR was crucial for developing a common workflow manager and establishing a shared vocabulary. Given the project's complexity and significant technical debt, implementing the workflow manager was a major challenge, but also a learning opportunity.

Key decisions were made collectively during bi-weekly meetings. For the workflow manager integration, I presented a Proof of Concept (POC) before implementing the complete solution.

Results

The results are multiple: numerous bugs fixed, better user feedback and ergonomics, an evolved data model for managing workflows, and integration of the workflow manager using the JSON-LD format, compliant with a scientific ontology. A complex continuous integration (CI) pipeline was built jointly to keep the workflow manager synchronized with the API. The repository for this ETL (extract, transform, load) is available here. Additionally, the end-to-end tests implemented with Robot Framework cover all key user journeys.
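To illustrate what a workflow description in JSON-LD might look like, here is a minimal sketch. The vocabulary IRI, the Workflow and WorkflowStep shapes, and the property names (hasStep, dependsOn) are all assumptions made for the example; the ontology actually used by the project is not named in this write-up. Only the JSON-LD keywords (@context, @id, @type) and the rdfs:label IRI are standard.

```typescript
// Hypothetical vocabulary IRI standing in for the project's scientific ontology.
const VOCAB = "https://example.org/workflow-ontology#";

// Hypothetical in-memory model of a workflow and its steps.
interface WorkflowStep {
  id: string;
  label: string;
  dependsOn: string[]; // ids of the steps that must run first
}

interface Workflow {
  id: string;
  label: string;
  steps: WorkflowStep[];
}

// Serialize a workflow as a JSON-LD document; ids are turned into IRIs under baseIri.
function toJsonLd(workflow: Workflow, baseIri: string): Record<string, unknown> {
  return {
    "@context": {
      "@vocab": VOCAB,
      label: "http://www.w3.org/2000/01/rdf-schema#label",
      // Values of dependsOn are IRIs pointing at other steps, not plain strings.
      dependsOn: { "@id": VOCAB + "dependsOn", "@type": "@id" },
    },
    "@id": baseIri + workflow.id,
    "@type": "Workflow",
    label: workflow.label,
    hasStep: workflow.steps.map((step) => ({
      "@id": baseIri + step.id,
      "@type": "Step",
      label: step.label,
      dependsOn: step.dependsOn.map((dep) => baseIri + dep),
    })),
  };
}

// Usage with an invented two-step workflow.
const doc = toJsonLd(
  {
    id: "leaf-area",
    label: "Leaf area estimation",
    steps: [
      { id: "segment", label: "Segment images", dependsOn: [] },
      { id: "measure", label: "Measure leaf area", dependsOn: ["segment"] },
    ],
  },
  "https://example.org/workflows/"
);
console.log(JSON.stringify(doc, null, 2));
```

Because every step and dependency is an IRI, the same document can be read as plain JSON by the API and as RDF by ontology tooling.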

I gained in-depth command of the Symfony framework and the PHP language, learned to work in a team with a clean CI, and learned to use Robot Framework with Selenium. Finally, the custom ETL work, which spans API Platform and Semantic Web ontology formats (RDF/OWL), strengthened my technical skills.

Technical Stack

The project relies on the following tools and technologies:

  • Backend: PHP, Symfony
  • Frontend: TypeScript, Angular
  • Tests: Robot Framework, Selenium
  • Infrastructure: Docker Compose
  • Documentation: Markdown
  • Custom ETL: Node.js, for the ETL between workflow manager files and Symfony

It is important to note that this technical stack was inherited from the existing PHIS system. The major technical challenges encountered include:

  • Complex inherited code, particularly a 600-line function with multiple levels of conditions and nested loops
  • Integration of workflow managers in an existing scientific information system